A METHOD OF SYNCHRONIZING TWO DIGITAL DATA STREAMS WITH
THE SAME CONTENT

The invention relates to a method of synchronizing
two digital data streams with the same content, for
example a reference stream transmitted by a broadcasting
system and the received stream, which may be degraded,
the method being usable in particular to evaluate
transmission quality.

BACKGROUND OF THE INVENTION
The introduction of digital technology into the
field of broadcasting audiovisual signals has opened up
new prospects and means that users may be offered more
services.

The signals are modified during the various stages
of broadcasting them because technical constraints
imposed, for example in terms of bit rate or bandwidth,
cause characteristic deterioration during difficult
transmission conditions.

To be able to provide a quality assured service, it
is necessary to develop tools and instruments for
measuring the quality of the signals and, where
applicable, for estimating the magnitude of the
deterioration that has occurred. Many measuring methods
have been developed for this purpose. Most of them are
based on comparing the signal present at the input of the
system under test, which is called the reference signal,
with the signal obtained at the output of the system,
which is called the degraded signal. Certain "reduced
reference" methods compare numbers calculated for the
reference signal and for the degraded signal instead of
using the signal samples directly. In both cases, in
order to evaluate quality by means of a comparison
technique, it is necessary to synchronize the signals in
time.

Figure 1 depicts the general principle of these
methods.

Although synchronization of the signals may be
easily achieved in simulation or when the system under
test is small, for example a coder-decoder (codec), and
not geographically distributed, this is not the case in a
complex system, in particular in the situation of
monitoring a broadcast network. Thus the synchronization
step of quality measuring algorithms is often critical.

In addition to applications for measuring quality in
a broadcast network, the method described herein is
applicable whenever temporal synchronization between two
audio and/or video signals is required, in particular in
the context of a distributed and extended system.

Various techniques may be used to synchronize
digital signals in time. The objective is to establish a
correspondence between a portion of the degraded signal Sd
and a portion of the reference signal Sr. Figure 2
depicts this in the case of two audio signals. The
problem is to determine a shift DEC that will synchronize
the signals.

In the case of an audio signal, the portion (or
element) for which a correspondence has to be established
is a time window, i.e. a period of the signal with an
arbitrary duration T.

The existing methods may be divided into three
classes:

• Correlation approach in the time domain: This is
the most usual approach and consists in comparing samples
of the two audio signals Sr and Sd to be synchronized,
based on their content. Thus the normalized
intercorrelation function between Sr and Sd, for example,
looks for the maximum resemblance over a given time
period T, for example plus or minus 60 ms, i.e. a total
period of 120 ms. The accuracy of synchronization
obtained is potentially to the nearest sample.

• Correlation approach in the time domain using
marker signals: methods that use this principle seek to
overcome the necessity for significant variations in the
signal. To this end, a specific marker signal designed
to allow robust synchronization is inserted into the
audio signal Sr. Thus exactly the same intercorrelation
method may be applied to the marker signals extracted
from the signals Sr and Sd to be synchronized, which in
theory allows robust synchronization regardless of the
content of the audio signal.

In order to use this method, the marker signal must
be inserted in such a way that the modification of the
content of the audio signal is as imperceptible as
possible. Several techniques may be used to insert
marker signals or other specific patterns, including
"watermarking".

• Synchronization using temporal markers: methods of
this class are usable only if the signals are associated
with temporal markers. Thus the method relies on
identifying, for each marker of the reference signal, the
nearest marker in the series of markers associated with
the degraded signal.
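
The first class above (correlation in the time domain) can be illustrated with a minimal Python sketch. It is given under stated assumptions and is not the procedure of any particular measuring system: the function name `normalized_xcorr_shift`, the sample lists and the `max_shift` search bound are hypothetical, and the score is the usual normalized cross-correlation computed on the overlapping samples.

```python
import math


def normalized_xcorr_shift(ref, deg, max_shift):
    """Return (shift, score): the shift in samples, within +/- max_shift,
    for which deg[k + shift] best resembles ref[k]."""
    best_shift, best_score = 0, -math.inf
    for shift in range(-max_shift, max_shift + 1):
        # Index range of ref for which deg[k + shift] also exists.
        lo = max(0, -shift)
        hi = min(len(ref), len(deg) - shift)
        if hi - lo < 2:
            continue  # not enough overlap to compare
        a = ref[lo:hi]
        b = deg[lo + shift:hi + shift]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


# Hypothetical example: the degraded signal is the reference delayed by 3 samples.
reference = [1, -2, 3, 5, -1, 4, -3, 2, 0, 6, -4, 1]
degraded = [0, 0, 0] + reference
print(normalized_xcorr_shift(reference, degraded, max_shift=5))
```

Every candidate shift requires a pass over all overlapping samples, so the cost of this sketch grows with the product of the window length and the search range.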

A powerful signal synchronization method is
characterized by a compromise between:

- its accuracy, i.e. the maximum error that occurs
on synchronizing two signals (in particular, the method
may be sensitive to the content of the signals),

- its calculation complexity, and

- finally, the volume of data necessary for
effecting the synchronization.

The main drawback of the techniques most usually
employed (using the correlation approach referred to
above) is the calculation power that is necessary, which
becomes very high as the search period T increases (see
Figure 2). Another major drawback is the necessity for
the content to evolve significantly and continuously.
Depending on the type of signals analyzed, this is not
always achieved. The content of the signals therefore
has a direct influence on the performance of the method.
Moreover, to utilize this type of approach on complete
temporal signals, it is necessary to have both the
signals Sr and Sd available at the comparison point; this
is a very severe constraint that is impossible to satisfy
in some applications, such as monitoring an operational
broadcasting network.

A feature of the second approach (using correlation
with marker signals) is the modification of the content
of the audio signal resulting from inserting the marker
signals, with no guarantee as to how this will impact on
quality; the measurement method therefore influences the
measurement itself. Regardless of the performance
achieved in terms of synchronizing the two signals, this
approach is not always suitable for a real quality
evaluation application.

Finally, the major drawback of synchronization using
temporal markers is the necessity to provide the temporal
markers. Because the accuracy of the temporal markers is
not always satisfactory, only a few applications are able
to use a technique of this kind.

In the context of broadcast network monitoring, and
because of the multiple constraints that apply to the
signals transported and the multiple items of equipment
the signals pass through (coders, multiplexers,
transmultiplexers, decoders, etc.), there is no strict
relationship between the audio signals and the temporal
markers. Thus this solution does not achieve the
necessary accuracy for a quality measuring application
using a reference.

OBJECTS AND SUMMARY OF THE INVENTION
An object of the present invention is to define a
method of achieving synchronization with a chosen level
of accuracy, of lower complexity than existing methods,
and combining the advantages of several approaches.
"Coarse" synchronization in accordance with the invention
delimits an error range whose duration is compatible with
the subsequent use of standard "fine" synchronization
methods if extreme accuracy is required.

The novelty of the proposed method is that it
achieves synchronization on the basis of at least one
characteristic parameter that is calculated from the
signals Sd and Sr and defines a multidimensional
trajectory, from which the synchronization of the signals
themselves is deduced. Because this method uses the
temporal content of the signals, the content must vary
continuously to ensure optimum synchronization, as in the
prior art temporal correlation methods. The advantage of
the proposed method is that it achieves correlation using
a multidimensional trajectory obtained in particular by
combining a plurality of characteristic parameters, which
makes it more reliable than the prior art methods.

A fundamental advantage of the method proposed by
the invention is that it necessitates only a small
quantity of data to achieve synchronization, which is
highly beneficial in the context of broadcast network
monitoring. In fact, in this context, it is generally
not possible to have the two complete signals Sr and Sd
available at the same location. Consequently, it is not
possible to use the standard temporal correlation
approach. Moreover, in the context of a quality
measurement application, the second approach using
correlation with marker signals is not easily applicable
because it impacts on the quality of the signals. In
contrast to this, the synchronization method of the
invention is compatible with quality measurement
techniques based on comparing parameters calculated from
the signals. The data representative of the
characteristic parameter(s) is usually conveyed to the
comparison points over a digital link. This digital link
advantageously uses the same transmission channel as the
audio signal; alternatively, a dedicated digital link may
be used. In one particular embodiment, used in a quality
measurement application, the data used to achieve
synchronization is obtained from one or more quality
measurement parameters. Moreover, coarse synchronization
is obtained from data D1 and D2 calculated at intervals
of A = 1024 audio samples. Fine synchronization may be
obtained from data D1 calculated at intervals of A = 1024
audio samples and data D2 calculated at intervals of
r < A, for example r = 32 audio samples. Thus in this
case the method obtains fine synchronization that is 32
times more accurate than the quality measurement
parameter transmission interval.

The method therefore integrates naturally into a
digital television quality monitoring system in an
operational broadcast network. However, it is applicable
wherever temporal synchronization between two signals is
required.

Thus the proposed method achieves synchronization
with an accuracy that may be chosen to obtain a very
small range of uncertainty. It advantageously uses at
least some of the parameters already calculated to
evaluate the quality of the signal. The ability to start
from an extended search period is also beneficial,
especially as the robustness of synchronization increases
with the duration of the starting period.

The proposed method therefore does not impose the
use of temporal markers external to the audio signals.
The signal to be synchronized does not need to be
modified either, which is important in a quality
measurement application.

Thus the invention provides a method of
synchronizing two digital data streams with the same
content, the method comprising the steps of:

a) generating at given intervals for each of the two
digital data streams S1 and S2 at least two characteristic
numbers expressing at least one parameter characteristic
of their content;

b) generating from said numbers points D1 and D2
associated with each of said streams and representing at
least one of said characteristic parameters in a space of
at least two dimensions, the points D1 and the points D2
that are situated in a time period T defining
trajectories representative of the data streams S1 and S2
to be synchronized;

c) shifting the time periods of duration T assigned
to the digital data streams S1 and S2 relative to each
other by calculating a criterion of superposition of said
trajectories having an optimum value representing the
required synchronization;

d) choosing the shift between the time periods
corresponding to said optimum value as a value
representative of the synchronization.

Advantageously in the method, one of the digital
data streams is a reference stream S1, the other data
stream is a stream S2 received via a transmission system,
the numbers characteristic of the reference stream S1 are
transmitted therewith, and the numbers characteristic of
the received stream S2 are calculated in the receiver.

In a first variant of the method, the step c)
entails:

c1) calculating a distance D between a first
trajectory represented by the points D1 belonging to a
first time period of duration T and a second trajectory
represented by the points D2 belonging to a second time
period of duration T, said distance D constituting said
superposition criterion; and

c2) shifting said first and second time periods of
duration T relative to each other until a minimum value
is obtained for the distance D that constitutes said
optimum value.

The distance D may be an arithmetic mean of the
distances d, for example the Euclidean distances, between
corresponding points D1, D2 of the two trajectories.
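
Steps c1) and c2) of this first variant can be sketched as follows. This is a minimal illustration, assuming that both trajectories are lists of coordinate tuples sampled at the same interval; the function names, the `n` window length and the `max_shift` bound are hypothetical and do not come from the specification.

```python
import math


def mean_distance(traj_a, traj_b):
    """Arithmetic mean of the Euclidean distances between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)


def best_shift_by_distance(ref_points, deg_points, n, max_shift):
    """Slide a window of n points over deg_points and return (shift, distance)
    for the shift that minimizes the mean distance to ref_points[:n]."""
    window = ref_points[:n]
    best = None
    for shift in range(max_shift + 1):
        candidate = deg_points[shift:shift + n]
        if len(candidate) < n:
            break  # ran out of points in the second trajectory
        dist = mean_distance(window, candidate)
        if best is None or dist < best[1]:
            best = (shift, dist)
    return best


# Hypothetical example: the second trajectory lags the first by two points.
ref = [(0, 0), (1, 2), (3, 1), (4, 4), (2, 5)]
deg = [(9, 9), (7, 3)] + ref + [(1, 1)]
print(best_shift_by_distance(ref, deg, n=5, max_shift=3))
```

The shift is expressed in numbers of points, so converting it to samples means multiplying by the interval at which the points were generated.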

In a second variant of the method, the step c)
entails:

c1) calculating a correlation function between
corresponding points D1, D2 on the two trajectories, said
correlation function constituting said superposition
criterion; and



c2) shifting said first and second time periods of
duration T relative to each other until a maximum value
of the correlation function is obtained that constitutes
said optimum value.
In a third variant of the method, the step c)
entails:

c1) converting each trajectory into a series of
angles between successive segments defined by the points
of the trajectory; and

c2) shifting said first and second time periods of
duration T relative to each other until a minimum value
is obtained for the differences between the values of
angles obtained for homologous segments of the two
trajectories, said minimum value constituting said
optimum value.
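
The third variant can be sketched in the same way. The code below is an illustration under assumptions, not the claimed implementation: it takes two-dimensional points, measures the turning angle between successive segments with `atan2`, and sums the absolute angle differences; all names and the `max_shift` bound are hypothetical.

```python
import math


def turning_angles(points):
    """Angle between each pair of successive segments of a 2-D trajectory."""
    angles = []
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        # Wrap the difference into the interval [-pi, pi).
        angles.append((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return angles


def best_shift_by_angles(ref_points, deg_points, max_shift):
    """Return (shift, cost) minimizing the summed absolute angle differences."""
    ref_angles = turning_angles(ref_points)
    deg_angles = turning_angles(deg_points)
    n = len(ref_angles)
    best = None
    for shift in range(max_shift + 1):
        candidate = deg_angles[shift:shift + n]
        if len(candidate) < n:
            break
        cost = sum(abs(a - b) for a, b in zip(ref_angles, candidate))
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best


# Hypothetical example: the second trajectory is delayed by two points and
# also scaled and offset, which leaves the turning angles unchanged.
ref = [(0, 0), (1, 0), (1, 1), (2, 3), (4, 3), (5, 1)]
deg = [(0, 5), (3, 3)] + [(2 * x + 1, 2 * y + 1) for x, y in ref]
print(best_shift_by_angles(ref, deg, max_shift=2))
```

Turning angles do not change when a trajectory is translated or uniformly scaled, which is one plausible reason for working on angles rather than on raw coordinates.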

In the method, the step c) may entail:

c1) converting the two trajectories into a series of
areas intercepted by successive segments defined by the
points of said trajectories, the total intercepted area
constituting said superposition criterion; and

c2) shifting the time periods of duration T relative
to each other until a minimum value is obtained of said
total intercepted area, which minimum value constitutes
said optimum value.

To make synchronization more accurate, one of said
given intervals may be equal to A for one of the data
streams and equal to r < A for the other data stream.

In the method, the generation of said characteristic
numbers for a reference audio data stream and for a
transmitted audio data stream may comprise the following
steps:

a) calculating for each time window the spectral
power density of the audio stream and applying to it a
filter representative of the attenuation of the inner and
middle ear to obtain a filtered spectral density;

b) calculating individual excitations from the
filtered spectral density using the frequency spreading
function in the basilar scale;

c) determining the compressed loudness from said
individual excitations using a function modeling the
non-linear frequency sensitivity of the ear, to obtain
basilar components; and

d) separating the basilar components into n classes,
for example where n < 5, and preferably into three
classes, and calculating for each class a number C
representing the sum of the frequencies of that class,
the characteristic numbers consisting of the numbers C.
Alternatively there are n' < n characteristic numbers
generated from said numbers C. The value chosen for n is
much lower than the number of samples, for example 0.01
times that number.

In the method, the generation of a characteristic
number for a reference audio data stream and for a
transmitted audio data stream comprises the following
steps:

a) calculating N coefficients of a prediction filter
by autoregressive modeling; and

b) determining in each temporal window the maximum
value of the residue as the difference between the signal
predicted by means of the prediction filter and the audio
signal, said maximum prediction residue value
constituting one of said characteristic numbers.
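
Steps a) and b) above can be illustrated with the following sketch. The specification only calls for autoregressive modeling; fitting the filter on each window with the autocorrelation method and the Levinson-Durbin recursion is an assumption made here for illustration, and all function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag (no normalization)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] such that
    x[n] is approximated by sum(a[i] * x[n - 1 - i])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # silent window: keep the remaining coefficients at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def max_prediction_residue(window, order):
    """Largest absolute difference between the window and its linear prediction."""
    coeffs = levinson_durbin(autocorr(window, order), order)
    residues = []
    for n in range(order, len(window)):
        predicted = sum(coeffs[i] * window[n - 1 - i] for i in range(order))
        residues.append(abs(window[n] - predicted))
    return max(residues)


# Hypothetical example: a smooth decay is well predicted, a click is not.
smooth = [0.5 ** n for n in range(16)]
clicked = smooth[:]
clicked[8] += 1.0
print(max_prediction_residue(smooth, 1), max_prediction_residue(clicked, 1))
```

A window that the filter predicts well yields a small maximum residue, while an abrupt event such as a click yields a large one, so the number varies with the content of the signal.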

In the method, the generation of said characteristic
numbers for a reference audio data stream and for a
transmitted audio data stream comprises the following
steps:

a) calculating for each time window the spectral
power density of the audio stream and applying to it a
filter representative of the attenuation of the inner and
middle ear to obtain a frequency spreading function in
the basilar scale;

b) calculating individual excitations from the
frequency spreading function in the basilar scale;

c) obtaining the compressed loudness from said
individual excitations using a function modeling the
non-linear frequency sensitivity of the ear, to obtain
basilar components;

d) calculating from said basilar components N'
prediction coefficients of a prediction filter by
autoregressive modeling; and

e) generating at least one characteristic number for
each time window from at least one of the N' prediction
coefficients.

The characteristic numbers may consist of 1 to 10 of
said prediction coefficients and preferably 2 to 5 of
said coefficients.

One characteristic number for an audio signal may be
the instantaneous power and/or the spectral power density
and/or the bandwidth.

One characteristic number for a video signal may be
the continuous coefficient of the transformation by a
linear and orthogonal transform of at least one portion
of an image belonging to the data stream, said
transformation being effected by blocks or globally,
and/or the contrast of at least one area of the image,
and/or the spatial activity SA of at least one area of an
image or its temporal activity (defined by comparison
with a previous image), and/or the average brightness of
at least one area of an image.

The points may be generated from at least two
characteristic numbers obtained from a single
characteristic parameter.

Alternatively, the points may be generated from at
least two characteristic numbers obtained from at least
two characteristic audio and/or video parameters.

In the method, the data stream comprises video data
and audio data and the method effects firstly video
synchronization based on points D1 and D2 associated with
at least one characteristic video parameter corresponding
to said video stream and secondly audio synchronization
based on points D"1 and D"2 associated with at least one
characteristic audio parameter corresponding to said
audio stream.

It may then include a step of determining the
synchronization shift between the video stream and the
audio stream as the difference between said shifts
determined for the video stream and for the audio stream.
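
The difference between the two shifts can only be taken once both are expressed in the same unit. The sketch below assumes, purely for illustration, that the video shift is counted in frames and the audio shift in samples; the function name, the rates and the sign convention (video shift minus audio shift) are hypothetical.

```python
from fractions import Fraction


def av_skew_seconds(video_shift_frames, frame_rate,
                    audio_shift_samples, sample_rate):
    """Video shift minus audio shift, both converted to seconds."""
    video_seconds = Fraction(video_shift_frames) / Fraction(frame_rate)
    audio_seconds = Fraction(audio_shift_samples) / Fraction(sample_rate)
    return video_seconds - audio_seconds


# Hypothetical example: 3 frames at 25 frames/s against 4800 samples at 48 kHz.
skew = av_skew_seconds(3, 25, 4800, 48000)
print(float(skew), "s")
```

Exact fractions are used so that the result does not depend on floating-point rounding of the two conversions.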
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the invention will
become more apparent on reading the description with
reference to the appended drawings, in which:

- Figure 1 shows the architecture of a prior art
system for measuring the quality of an audio signal;

- Figure 2 depicts the audio signal synchronization
problem;

- Figure 3 shows an increase in synchronization
accuracy that may be achieved in the context of the
present invention;

- Figure 4 depicts an example of two bidimensional
trajectories of audio signals to be synchronized in a
situation where r = A/2;

- Figures 5 and 6 depict two variants of
synchronization between two trajectories assigned to two
data streams;

- Figure 7 is a flowchart of a trajectory-based
synchronization method of the invention;

- Figures 8 to 10 depict synchronization in
accordance with the invention when the significant
parameter is a perceived audio parameter, Figures 10a and
10b respectively depicting the situation before and after
synchronization of two trajectories; and

- Figure 11 depicts a use of a method employing
autoregressive modeling of the signal with linear
prediction coefficients as the characteristic parameter.

MORE DETAILED DESCRIPTION
The first step of the method calculates at least two
characteristic numbers from one or more characteristic
parameters over all of the time windows of the signals to
be synchronized and over the required synchronization
period; each number is therefore calculated at intervals
A (see Figures 2 and 3), which yields N = T/A parameters.
If possible, the number(s) must be simple to calculate,
so as not to demand excessive calculation power. Each
characteristic parameter may be of any kind and may be
represented by a single number, for example. One
characteristic parameter of the content of an audio
signal is the bandwidth, for example.

Providing the parameters only at intervals A greatly
reduces the quantity of data necessary to obtain
synchronization from the reference signal Sr. However,
the accuracy of the resulting synchronization is
necessarily limited; the uncertainty with respect to an
ideal synchronization, i.e. to the nearest signal sample,
is ±A/2. If this uncertainty is too great, one
alternative is to reduce the period A; however, this
modification is rarely possible since it calls into
question the calculation of the characteristic number(s)
and increases the quantity of data necessary for
synchronization.

In the particular embodiment in which the parameters
are also used to evaluate quality by comparing the
parameters P1 and P'1, any synchronization error exceeding
the resolution r0 of the parameter will prevent estimation
of the deterioration introduced (this is Situation A in
Figure 3).

To obtain an arbitrary synchronization accuracy,
with an uncertainty value r that may be less than A/2,
for example, without increasing the quantity of data
extracted from the reference signal, the characteristic
numbers may be calculated with a higher temporal
resolution. For this purpose, the parameters are
calculated at intervals r < A from the second signal to
be synchronized (the "degraded" signal), which
corresponds to A/r parameters P'1^i for a parameter P1. The
calculation complexity increases from T/A to T/r
calculation windows, but only for the received signal.
The situation B of Figure 3 illustrates the method used.
For example, r is a sub-multiple of A.
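
The window counts mentioned above can be checked with a short calculation. This is only a worked example: the function name and the search period of 102400 samples are hypothetical, while A = 1024 and r = 32 are the values quoted earlier in the text.

```python
def window_counts(T, A, r):
    """Number of calculation windows for a search period T (in samples),
    a reference interval A and a finer interval r that divides A."""
    if T % A != 0 or A % r != 0:
        raise ValueError("T must be a multiple of A, and A a multiple of r")
    return {
        "reference_windows": T // A,    # N = T/A, computed on the reference
        "degraded_windows": T // r,     # N' = T/r, computed on the received signal
        "points_per_interval": A // r,  # A/r fine-grained values per reference value
    }


print(window_counts(T=102400, A=1024, r=32))
```

Only the count for the received signal grows when r is reduced, so the quantity of data extracted from the reference signal stays at T/A values.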
Notation
- T: synchronization search period (T is a multiple
of A);

- r0: maximum permitted synchronization error/
uncertainty;

- e: synchronization error;

- A: period of calculating the parameters from the
signal;

- Pk: parameter calculated from the first
("reference") signal Sr (k is a temporal index indicating
to which calculation period A Pk corresponds);

- P'k: parameter calculated from the second
("degraded") signal Sd (k is a temporal index indicating
to which calculation period A P'k corresponds);

- P'k^i: parameter calculated from the second
("degraded") signal Sd (k is a temporal index
indicating to which calculation period A P'k^i
corresponds); and

- i: temporal subindex indicating a number of
periods r from 1 to A/r within the period A.

Note: All durations correspond to an integer number
of samples of the audio or video signal.

The second step processes the parameters to define
one or more coordinates. A set of β coordinates is
calculated for each set of parameters Pk or P'k^i obtained
over the window k of duration A corresponding to 1024
samples of the reference signal or the degraded signal,
respectively, for example.

The prime aim of this step is to obtain pertinent
coordinate values for carrying out synchronization, with
given bounds and limits. Thus each coordinate is
obtained from a combination of available characteristic
numbers. Moreover, this step reduces the number of
dimensions and therefore simplifies subsequent
operations.

In one preferred embodiment, two coordinates must be
obtained (β = 2). For example, if two characteristic
parameters are used, each of them may be used to
determine a coordinate. Alternatively, more
characteristic numbers may be used; processing may be
carried out to provide fewer numbers, for example two
coordinates, which are then interpreted as a projection
from a space with as many dimensions as there are
characteristic numbers to a space with two coordinates,
for example.

The third step constructs the trajectory (see Figure
4). The trajectory defines a signature of a segment of
the audio signal over the duration T by means of a series
of points in a space with as many dimensions as there are
coordinates. The use of a space with two or more
dimensions enables a particular trajectory to be
constructed, achieving high reliability and high accuracy
of synchronization.
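
The construction of a two-dimensional trajectory can be sketched as follows. The choice of coordinates is an assumption made for illustration only: the first is the mean power of the window, in the spirit of the instantaneous power mentioned earlier, and the second (peak absolute amplitude) is a hypothetical stand-in for any other characteristic number. The function name is also hypothetical.

```python
def window_features(samples, interval):
    """One (power, peak) point per non-overlapping window of `interval` samples."""
    points = []
    for start in range(0, len(samples) - interval + 1, interval):
        window = samples[start:start + interval]
        power = sum(s * s for s in window) / interval  # mean power of the window
        peak = max(abs(s) for s in window)             # hypothetical second number
        points.append((power, peak))
    return points


# Hypothetical example: eight samples, one point every four samples.
trajectory = window_features([1, -1, 2, -2, 0, 0, 3, 3], interval=4)
print(trajectory)
```

Each tuple is one point of the trajectory, and the order of the tuples carries the time information used by the synchronization criteria.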

After these three steps, synchronizing the signals
amounts to synchronizing two trajectories (or curves
parameterized by time) in a space of two or more
dimensions:

- The first trajectory is defined by points Rk
obtained from significant numbers Pk calculated at
intervals A over the time period T. There are N = T/A
points Rk.

- The second trajectory is defined by points Dk = Dk^1
obtained from significant numbers P'k = P'k^1 calculated at
intervals A over the range T. There are N' = N = T/A
points Dk.

If a period r < A is used to calculate the
parameters P'k^i, the trajectory is defined by the points
Dk^i, of which there are N' = T/r.

To this end, a criterion of resemblance between two
trajectories of N points (or of N and N' points) is used.
The following methods are described by way of example:

The first method proposed minimizes a distance
between the two trajectories.

The basic idea is to calculate a distance over a
portion of the trajectory. An appropriate portion of
each trajectory is selected as a function of the maximum
range of desynchronization of the curves corresponding to
the audio or video signals.

Over these portions, a cumulative total Diff of the
distances d between the peaks Rk and Dk+delta or Dk+delta^i of
the curves is calculated from equations (1) and (2)
below, respectively, by applying successive shifts delta,
in order to find the shift minimizing the distance Diff
between trajectories.

Figure 4 depicts the calculation for one example,
with points defined by two coordinates in a space with
β = 2 dimensions. For the "degraded" signal, the
parameters are calculated at intervals r = A/2, i.e. with
twice the resolution of the first signal.

The distance Diff gives the distance between the two
trajectories. The arithmetic mean of the peak to peak
distances is preferred, but another distance calculation
is equally applicable.

$$\mathrm{Diff}(\mathit{delta}) = \frac{1}{N} \sum_{k=1}^{N} d\left(R_k, D_{k+\mathit{delta}}\right) \qquad (1)$$

$$\mathrm{Diff}(\mathit{delta}) = \frac{1}{N} \sum_{k=1}^{N} d\left(R_k, D^{i}_{k+\mathit{delta}}\right) \qquad (2)$$

where i = 1..α, N = T/A and d(A,B) is the distance
between two points or peaks. This distance d(A,B) may
also be of any kind. In one particular embodiment, the
Euclidean distance is used:

$$d(A,B) = \sqrt{\sum_{j=1}^{\beta} \left(a_j - b_j\right)^2}$$

where j = 1..β, aj and bj are the coordinates of the
points A and B and β designates the number of coordinates
of each point.

The shift delta giving the minimum distance Diff
corresponds to resynchronization of the curves and
consequently of the original signal. In this example
(Figure 4) the shift is 2, which is twice the initial
parameter calculation period A. The synchronization
range will therefore be from:

$$t + 2A - \frac{r}{2} \quad \text{to} \quad t + 2A + \frac{r}{2} \qquad (3)$$

The second criterion proposed is maximization of a
correlation between the two trajectories.

This criterion works in a similar way to the
preceding one, except that it maximizes the value Correl.
Equations (1) and (2) are replaced by the following two
equations:

$$\mathrm{Correl}(\mathit{delta}) = \sum_{k=1}^{N} D_k * R_{k-\mathit{delta}} \qquad (4)$$

in which the operator * denotes the scalar product
defined as follows:

[Equation: scalar product A * B expressed in terms of the
coordinate products aj * bj]

where aj and bj are the coordinates of the points A and B.
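
The correlation criterion can be sketched as follows, under stated assumptions. The score sums the scalar products of corresponding points; dividing by the product of the trajectory norms, so that identical trajectories score 1, is an assumption of this sketch, as are the function names and the `max_shift` bound.

```python
import math


def correl(ref_points, deg_points):
    """Sum of scalar products of corresponding points, normalized to [-1, 1]."""
    num = sum(sum(a * b for a, b in zip(p, q))
              for p, q in zip(ref_points, deg_points))
    ref_energy = sum(sum(a * a for a in p) for p in ref_points)
    deg_energy = sum(sum(b * b for b in q) for q in deg_points)
    den = math.sqrt(ref_energy * deg_energy)
    return num / den if den > 0 else 0.0


def best_shift_by_correlation(ref_points, deg_points, max_shift):
    """Return (shift, score) for the shift that maximizes the correlation."""
    n = len(ref_points)
    best = None
    for shift in range(max_shift + 1):
        candidate = deg_points[shift:shift + n]
        if len(candidate) < n:
            break
        score = correl(ref_points, candidate)
        if best is None or score > best[1]:
            best = (shift, score)
    return best


# Hypothetical example: the second trajectory lags the first by two points.
ref = [(1, 2), (3, -1), (-2, 4), (0, 5)]
deg = [(4, 4), (-1, 0)] + ref + [(2, 2)]
print(best_shift_by_correlation(ref, deg, max_shift=3))
```

Unlike the distance criterion, the best shift here is the one with the largest score, which is why the search keeps the maximum rather than the minimum.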

The following methods are particularly suitable for
β = 2 coordinates.

Other techniques make the method more robust in the
presence of significant differences between the signals
to be synchronized, for example caused by deterioration
during broadcasting, namely:

• distance between successive angles of the
trajectories

This method consists in transforming the
two-dimensional trajectory into a series of angles between
successive segments defined by the points of the
trajectory. Figure 5 shows the definition of the angles
Δφ.

The criterion used for synchronizing the two
trajectories is minimization of the following equation:

[Equation (6): criterion computed from the differences
between the angles Δφ of homologous segments of the two
trajectories, one trajectory being shifted by delta]

• intercepted area between the two curves

This method consists in transforming the
two-dimensional trajectory into a series of areas intercepted
by successive segments defined by the points of the
trajectory. Figure 6 shows the definition of the
intercepted areas S.

The criterion used for synchronizing the two
trajectories is minimization of the following equation:

$$S_{\mathrm{total}} = \mathrm{Diff}(\mathit{delta}) = \sum_{k=1}^{N-1} \left|S_k\right| \qquad (7)$$

• Finally, the simultaneous use of a plurality of
criteria is possible.

Once the value delta of the
resynchronization between the two signals has been
determined by one of the above methods, the two signals
may be resynchronized by applying the shift delta to one
of the signals. Synchronization is obtained to an
accuracy determined by the rate at which the
characteristic numbers are calculated.

Figure 7 is a flowchart of a synchronization method.

If the required accuracy is not achieved, i.e. if
the synchronization is too "coarse" for the target
application, there may be a final step to refine the
preceding result.

A prior art procedure may be applied to the
synchronization uncertainty range A or r, which is now
sufficiently small for the complexity to be acceptable.
For example, an approach based on correlation in the time
domain may be used, preferably an approach that uses
marker signals.

However, this step should be used only in certain
specific instances because, in the quality measurement
type of target application, refining the synchronization
is generally not necessary since sufficient accuracy is
achieved. Moreover, as explained above, the prior art
techniques necessitate the availability of data on the
signals that is not readily transportable in a complex
and distributed system.

One particular embodiment of the invention relates 
to an application for monitoring audio quality in a 
digital television broadcast network. In this context, a 
major benefit of the invention is that it achieves 

10 synchronization using data used for evaluating quality, 
as this avoids or minimizes the need to transmit data 
specific to synchronization. 

Diverse characteristic numbers for estimating the
magnitude of the deterioration introduced on broadcasting
the signal are calculated from the reference signal at
the input of the network (this refers to "reduced
reference" methods). The reference numbers Pr are sent
over a data channel to the quality measurement point,
characteristic numbers Pm are calculated from the degraded
signal at the measurement point, and quality is estimated
by comparing the parameters Pr and Pm. They must be
synchronized for this, on the basis of the characteristic
parameter(s) used for the reference.

Quality is therefore estimated by comparing the
parameters Pr and Pm, which must be synchronized for this
to be possible.

The principle of objective perceived measurements is
based on converting a physical representation (sound
pressure level, level, time and frequency) into a
psychoacoustic representation (sound force, masking
level, critical times and bands or barks) of two signals
(the reference signal and the signal to be evaluated), in
order to compare them. This conversion is effected by
modeling the human auditory apparatus (generally by
spectral analysis in the Barks domain followed by
spreading phenomena).

The following embodiment of the method of the
invention uses a perceived characteristic parameter known
as the "perceived count error". The novelty of this
parameter is that it establishes a measurement of the
uniformity of a window in the audio signal. A sound
signal whose frequency components are stable is
considered to be uniform. Conversely, "perfect" noise
corresponds to a signal that covers all the frequency
bands uniformly (flat spectrum). This type of parameter
may therefore be used to characterize the content of the
signal. This capacity is reinforced by its perceived
character, i.e. by taking account of characteristics of
the human auditory apparatus known from psychoacoustics.

The steps applied to the reference signal and to the
degraded signal to take account of psychoacoustics are as
follows:

• Windowing of the temporal signal in blocks and
then, for each block, calculating the excitation induced
by the signal using a hearing model. This representation
of the signals takes account of psychoacoustic phenomena
and supplies a histogram whose counts are basilar
component values. Thus only the audible components of
the signal need to be taken into account, i.e. only the
useful information. Standard models may be used to
obtain this excitation: attenuation of the external and
middle ear, integration in critical bands and frequency
masking. The time windows chosen are of approximately 42
ms duration (2048 points at 48 kHz), with a 50% overlap.
This achieves a temporal resolution of the order of
21 ms.

Modeling entails a plurality of steps. In the first
step, the attenuation filter of the external and middle
ear is applied to the spectral power density obtained
from the spectrum of the signal. This filter also takes
account of an absolute hearing threshold. The concept of
critical bands is modeled by conversion from a frequency
scale to a basilar scale. The next step calculates
individual excitations to take account of masking
phenomena, using the spreading function in the basilar
scale and non-linear addition. The final step uses a
power function to obtain the compressed loudness for
modeling the non-linear frequency sensitivity of the ear
by a histogram comprising 109 basilar components.

• The counts of the histogram obtained are then
periodically vectored in three classes to obtain a
representation along a trajectory that is used to
visualize the evolution of the structure of the signals
and for synchronization. This also yields a simple and
concise characterization of the signal and thus provides
a reference parameter (or characteristic parameter).
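The block segmentation used in the first step above (2048-point windows at 48 kHz with a 50% overlap) can be sketched as follows. This is only an illustration of the arithmetic; the constant and function names are hypothetical, and the hearing model itself is not reproduced.

```python
SAMPLE_RATE_HZ = 48_000
WINDOW_SIZE = 2048             # points per analysis window
HOP_SIZE = WINDOW_SIZE // 2    # 50% overlap gives a hop of 1024 points


def window_starts(num_samples, window_size=WINDOW_SIZE, hop_size=HOP_SIZE):
    """Start index of every complete window that fits in num_samples."""
    if num_samples < window_size:
        return []
    return list(range(0, num_samples - window_size + 1, hop_size))


# Duration of one window and of one hop, in milliseconds.
window_ms = 1000.0 * WINDOW_SIZE / SAMPLE_RATE_HZ   # about 42.7 ms
hop_ms = 1000.0 * HOP_SIZE / SAMPLE_RATE_HZ         # about 21.3 ms
```

One characteristic value is therefore produced every 1024 samples, which is the period that bounds the accuracy of the coarse synchronization.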

There are various strategies for fixing the limits
of the three classes; the simplest divides the histogram
into three areas of equal size. Thus the 109 basilar
components, which represent 24 Barks, may be separated at
the following indices:

$$IS_1 = 36, \quad \text{i.e.} \quad z = \frac{24}{109} \times 36 = 7.927 \text{ Barks} \tag{8}$$

$$IS_2 = 73, \quad \text{i.e.} \quad z = \frac{24}{109} \times 73 = 16.073 \text{ Barks} \tag{9}$$

The second strategy takes account of the BEERENDS
scaling areas. This corresponds to compensation of the
gain between the excitation of the reference signal and
that of the signal under test by considering three areas
in which the ear would perform this same operation. Thus
the limits set are as follows:

$$IS_1 = 9, \quad \text{i.e.} \quad z = \frac{24}{109} \times 9 = 1.982 \text{ Barks} \tag{10}$$

$$IS_2 = 100, \quad \text{i.e.} \quad z = \frac{24}{109} \times 100 = 22.018 \text{ Barks} \tag{11}$$
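The index-to-Bark conversion in equations (8) to (11), and the reduction of the 109-component histogram to three class counts, can be sketched as below. The function names are hypothetical, and the handling of the boundary components (whether index S1 or S2 itself belongs to the lower or upper class) is an assumption, since the text does not specify it. The counts are returned in the order used when the text defines C1 (high frequencies), C2 (medium) and C3.

```python
NUM_COMPONENTS = 109   # basilar components in the excitation histogram
NUM_BARKS = 24         # range covered by those components


def index_to_bark(index):
    """Convert a component index to the Bark scale, as in (8) to (11)."""
    return NUM_BARKS / NUM_COMPONENTS * index


def class_counts(excitation, s1, s2):
    """Sum the histogram over three classes split at indices s1 and s2.

    Returns (c1, c2, c3) = (high, medium, low). Boundary handling
    is an assumption: components s1..s2-1 are treated as medium.
    """
    low = sum(excitation[:s1])
    medium = sum(excitation[s1:s2])
    high = sum(excitation[s2:])
    return high, medium, low
```

For example, `index_to_bark(36)` rounds to 7.927 and `index_to_bark(100)` rounds to 22.018, matching the limits of the two strategies.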

The trajectory is then represented in a triangle
known as the frequency triangle. For each block three
counts C1, C2 and C3 are obtained, and thus two Cartesian
coordinates, conforming to the following equations:

$$X = \frac{C_1}{N} + \frac{C_2}{2N} \tag{12}$$

$$Y = \frac{C_2}{N} \sin\left(\frac{\pi}{3}\right) \tag{13}$$

where C1 is the sum of the excitations for the high
frequencies (components above S2),

C2 is the count associated with the medium
frequencies (components from S1 to S2), and

N = C1 + C2 + C3 is the total sum of the values of
the components.

A point (X, Y) is therefore obtained for each
temporal window of the signal. Each of the coordinates X
and Y constitutes a characteristic number.
Alternatively, C1, C2 and C3 may be taken as
characteristic numbers.
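The mapping from the three counts to a point of the frequency triangle can be sketched as follows. The second term of X is taken here as C2/(2N); this is an assumption, chosen because it is the value that places the point inside an equilateral triangle with vertices (0, 0), (1, 0) and (1/2, sin(π/3)), consistently with equation (13). The function name is hypothetical.

```python
import math


def triangle_point(c1, c2, c3):
    """Map the three class counts to Cartesian coordinates (X, Y).

    Assumes X = C1/N + C2/(2N) and Y = (C2/N) * sin(pi/3),
    with N = C1 + C2 + C3.
    """
    n = c1 + c2 + c3
    if n <= 0:
        raise ValueError("the total count N must be positive")
    x = c1 / n + c2 / (2.0 * n)
    y = c2 / n * math.sin(math.pi / 3.0)
    return x, y
```

Under this assumption a block whose energy is entirely in the high class maps to the vertex (1, 0), one entirely in the low class maps to (0, 0), and one entirely in the medium class maps to the apex.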

For a complete sequence, the associated
representation is therefore a trajectory parameterized by
time, as shown in Figure 8.
Of the various methods available for synchronizing
the trajectories, the technique chosen by way of example
is that based on minimizing the distance between points
on the trajectories.

It is important to note that the calculation of the
parameter for the synchronization used in this case
remains complex, but that this parameter may also be used
to estimate the quality of the signal. It must therefore
be calculated anyway, and this is therefore not an
additional calculation load at the time of the
comparison, especially as the calculation relating to
this parameter is effected locally only for the received
digital stream.

Figure 9 summarizes the method used to synchronize
the signals in the context of monitoring the quality of
broadcast signals using the above characteristic
parameter.

The following example illustrates the case of a
reference file (R1) which is MPEG2 coded and decoded at
128 kbit/s, yielding a degraded file (R2). The shift
introduced between the two files is 6000 samples. The shift
found is six windows, i.e. 6*1024 = 6144 samples. The
error (144) is much less than the period (1024) of the
characteristic parameter. Figures 10a and 10b show the
trajectories before and after synchronization.

Before synchronization (Figure 10a), there is no
point to point correspondence between the two
trajectories. After synchronization (Figure 10b), the
correspondence between the two trajectories is optimized
in terms of the distance criterion (cf. equation (1)).
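A minimal sketch of trajectory synchronization by distance minimization is given below. Equation (1) is not reproduced in this section, so the criterion used here (mean Euclidean distance between corresponding points over the overlapping part of the two trajectories) is an assumption, as are the function names and the `min_overlap` safeguard.

```python
import math


def mean_distance(ref, deg, shift, min_overlap=1):
    """Mean Euclidean distance between ref[i] and deg[i + shift].

    Only indices where both trajectories have a point are used.
    Returns infinity when fewer than min_overlap pairs exist.
    """
    pairs = [(ref[i], deg[i + shift])
             for i in range(len(ref))
             if 0 <= i + shift < len(deg)]
    if len(pairs) < min_overlap:
        return math.inf
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def best_shift(ref, deg, max_shift, min_overlap=1):
    """Shift (in windows) within +/- max_shift that minimizes the distance."""
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates,
               key=lambda s: mean_distance(ref, deg, s, min_overlap))
```

The returned shift is expressed in windows. In the example above, a result of six windows with a period of 1024 samples corresponds to 6*1024 = 6144 samples.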

More refined synchronization is generally not
needed, especially if the uncertainty resulting from the
procedure explained here is less than the maximum
synchronization error permitted by the quality
measurement parameter. For more demanding quality
parameters, the necessary resolution r0 is of the order of
32 samples.

In Figure 10a, the original range is of the order of
120 ms, i.e. 5760 samples at 48 kHz. Using only the
characteristic numbers available for the evaluation of
quality (every 1024 samples, i.e. every Δ), a first
synchronization is carried out with an uncertainty of
1024 samples, which is better by a factor of more than 5
compared to 5760, for very limited calculation power
dedicated to synchronization.

However, in a second step, for example, more
frequent calculation of the quality parameters for the
second (degraded) signal (r < Δ) enables the
synchronization error to be further reduced to r samples,
if required.

Another characteristic parameter uses autoregressive 
modeling of the signal. 

The general principle of linear prediction is to
model a signal as a combination of its past values. The
basic idea is to calculate the N coefficients of a
prediction filter by autoregressive (all pole) modeling.
It is possible to obtain a predicted signal from the real
signal using this adaptive filter. The prediction or
residual errors are calculated from the difference
between these two signals. The presence and the quantity
of noise in a signal may be determined by analyzing these
residues.

The magnitude of the modifications and defects
introduced may be estimated by comparing the residues
obtained for the reference signal and those calculated
from the degraded signal.

Because there is no benefit in transmitting all of
the residues if the bit rate of the reference is to be
reduced, the reference to be transmitted corresponds to
the maximum of the residues over a time window of given
size.

Two methods of adapting the coefficients of the
prediction filter are described hereinafter by way of
example:

- The LEVINSON-DURBIN algorithm, which is described,
for example, in "Traitement numérique du signal - Théorie
et pratique" ["Digital signal processing - Theory and
practice"] by M. BELLANGER, MASSON, 1987, pp. 393 to 395.
To use this algorithm, an estimate is required of the
autocorrelation of the signal over a set of N0 samples.
This autocorrelation is used to solve the Yule-Walker
system of equations and thus to obtain the coefficients
of the prediction filter. Only the first N values of the
autocorrelation function may be used, where N designates
the order of the algorithm, i.e. the number of
coefficients of the filter. The maximum prediction error
is retained over a window comprising 1024 samples.

- The gradient algorithm, which is also described in
the above-mentioned book by M. BELLANGER, for example,
starting at page 371. The main drawback of the preceding
parameter is the necessity, in the case of a DSP
implementation, to store the N0 samples in order to
estimate the autocorrelation, together with the
coefficients of the filter, and then to calculate the
residues. The second parameter avoids this by using
another algorithm to calculate the coefficients of the
filter, namely the gradient algorithm, which uses the
error that has occurred to update the coefficients. The
coefficients of the filter are modified in the direction
of the gradient of the instantaneous quadratic error,
with the opposite sign.
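The first of the two methods above can be sketched as follows. This is the textbook Levinson-Durbin recursion for the Yule-Walker equations, written in plain Python under the assumed convention that the predicted sample is the sum of a[k] * x[n-k]; it is not claimed to match the cited book's notation, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased-sum autocorrelation estimate r[0..max_lag] of a sample block."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a predictor of the given order.

    r     : autocorrelation values r[0..order]
    return: (coefficients a[1..order], final prediction error power)
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. all-zero block): stop early
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= (1.0 - k_m * k_m)
    return a[1:], err
```

As a check, the autocorrelation sequence 1, 0.5, 0.25 of a first-order process yields the coefficients 0.5 and 0, with a prediction error power of 0.75.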
When the residues have been obtained from the
difference between the predicted signal and the real
signal, only the maximum of their absolute values over a
time window of given size T is retained. The reference
vector to be transmitted can therefore be reduced to a
single number.
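The gradient method and the reduction of the residues to one number per window can be sketched as below. This uses a plain LMS update as a stand-in for the gradient algorithm; the step size, the zero initial coefficients and the function names are assumptions made for illustration.

```python
def lms_residues(signal, order, step):
    """Prediction residues of an adaptive predictor updated by LMS.

    Each coefficient moves against the gradient of the instantaneous
    squared error, i.e. by +step * error * past_sample.
    """
    coeffs = [0.0] * order
    residues = []
    for n in range(len(signal)):
        # Past samples x[n-1] .. x[n-order], zero before the start.
        past = [signal[n - k] if n - k >= 0 else 0.0
                for k in range(1, order + 1)]
        predicted = sum(c * p for c, p in zip(coeffs, past))
        error = signal[n] - predicted
        residues.append(error)
        coeffs = [c + step * error * p for c, p in zip(coeffs, past)]
    return residues


def window_maxima(residues, window_size=1024):
    """Maximum absolute residue in each consecutive window."""
    return [max(abs(e) for e in residues[i:i + window_size])
            for i in range(0, len(residues), window_size)]
```

Unlike the autocorrelation method, this update needs only the last `order` samples and the current coefficients, so no block of samples has to be stored.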

After transmission followed by synchronization, 
comparison consists in simply calculating the distance 
between the maxima of the reference and the degraded 
signal, for example using a difference method. 

Figure 5 summarizes the parameter calculation
principle.

The main advantage of the two parameters is the low bit
rate necessary for transferring the reference, which is
reduced to one real number for 1024 signal
samples.

However, no account is taken of any psychoacoustic
model.

Another characteristic parameter uses autoregressive 
modeling of the basilar excitation. 

In contrast to the standard linear prediction
method, this method takes account of psychoacoustic
phenomena in order to obtain an evaluation of perceived
quality. For this purpose, calculating the parameter
entails modeling diverse hearing principles. Linear
prediction models the signal as a combination of its past
values. Analysis of the residues (or prediction errors)
determines the presence of noise in a signal and
estimates the noise. The major drawback of these
techniques is that they take no account of psychoacoustic
principles. Thus it is not possible to estimate the
quantity of noise actually perceived.

The method uses the same general principle as
standard linear prediction and additionally integrates
psychoacoustic phenomena in order to adapt to the
non-linear sensitivity of the human ear in terms of frequency
(pitch) and intensity (loudness).
The spectrum of the signal is modified by means of a
hearing model before calculating the linear prediction
coefficients by autoregressive (all pole) modeling. The
coefficients obtained in this way provide a simple way to
model the signal taking account of psychoacoustics. It
is these prediction coefficients that are sent and used
as a reference for comparison with the degraded signal.

The first part of the calculation of this parameter
models psychoacoustic principles using the standard
hearing models. The second part calculates linear
prediction coefficients. The final part compares the
prediction coefficients calculated for the reference
signal and those obtained from the degraded signal. The
various steps of this method are therefore as follows:

- Time windowing of the signal followed by
calculation of an internal representation of the signal
by modeling psychoacoustic phenomena. This step
corresponds to the calculation of the compressed
loudness, which is in fact the excitation in the inner
ear induced by the signal. This representation of the
signal takes account of psychoacoustic phenomena and is
obtained from the spectrum of the signal, using the
standard form of modeling: attenuation of the external
and middle ear, integration in critical bands, and
frequency masking; this step of the calculation is
identical to the parameter described above;

- Autoregressive modeling of the compressed loudness
in order to obtain the coefficients of an FIR prediction
filter, exactly as in standard linear prediction; the
method used is that of autocorrelation by solving the
Yule-Walker equations; the first step for obtaining the
prediction coefficients is therefore calculating the
autocorrelation of the signal.

It is possible to calculate the perceived
autocorrelation of the signal using an inverse Fourier
transform by considering the compressed loudness as a
filtered spectral power.

One method of solving the Yule-Walker system of
equations and thus of obtaining the coefficients of a
prediction filter uses the Levinson-Durbin algorithm.

It is the prediction coefficients that constitute
the reference vector to be sent to the comparison point.
The transforms used for the final calculations on the
degraded signal are the same as are used for the initial
calculations applied to the reference signal.

- Estimating the deterioration by calculating a
distance between the vectors from the reference and from
the degraded signal. This compares coefficient vectors
obtained for the reference and for the transmitted audio
signal, enabling the deterioration caused by transmission
to be estimated, using an appropriate number of
coefficients. The higher this number, the more accurate
the calculations, but the greater the bit rate necessary
for transmitting the reference. A plurality of distances
may be used to compare the coefficient vectors. The
relative size of the coefficients may be taken into
account, for example.
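The comparison step that closes the list above can be sketched with two possible distances. The text does not name a specific distance, so both functions below are illustrative assumptions: a plain Euclidean distance, and a variant in which each difference is scaled by the magnitude of the reference coefficient as one way of taking the relative size of the coefficients into account.

```python
import math


def euclidean_distance(ref_coeffs, deg_coeffs):
    """Plain Euclidean distance between two coefficient vectors."""
    if len(ref_coeffs) != len(deg_coeffs):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((r - d) ** 2
                         for r, d in zip(ref_coeffs, deg_coeffs)))


def relative_distance(ref_coeffs, deg_coeffs, eps=1e-12):
    """Distance with each difference divided by |reference coefficient|.

    eps avoids division by zero for vanishing reference coefficients.
    """
    if len(ref_coeffs) != len(deg_coeffs):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum(((r - d) / (abs(r) + eps)) ** 2
                         for r, d in zip(ref_coeffs, deg_coeffs)))
```

With 5 to 10 coefficients per window, either distance yields one deterioration estimate per window once the two vectors have been synchronized.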

The principle of the method may be as summarized in
the Figure 11 diagram.

Modeling psychoacoustic phenomena yields 24 basilar
components. The order N of the prediction filter is 32.
From these components, 32 autocorrelation coefficients
are estimated, yielding 32 prediction coefficients, of
which only 5 to 10 are retained as a quality indicator
vector, for example the first 5 to 10 coefficients.
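One way to obtain autocorrelation values from the basilar components by an inverse Fourier transform is sketched below. It treats the compressed loudness as samples of a real, even power spectrum taken at the centres of equal-width bands and evaluates the corresponding inverse cosine transform at each lag. The sampling positions, the normalization and the function name are assumptions; the text states only that the compressed loudness is considered as a filtered spectral power.

```python
import math


def autocorrelation_from_spectrum(power, num_lags):
    """Autocorrelation r[0..num_lags-1] from a one-sided power spectrum.

    Assumes power[j] is the spectral power at the centre of band j
    of M equal bands, so that
        r[k] = (1/M) * sum_j power[j] * cos(pi * k * (j + 0.5) / M).
    """
    m = len(power)
    return [sum(p * math.cos(math.pi * k * (j + 0.5) / m)
                for j, p in enumerate(power)) / m
            for k in range(num_lags)]
```

With 24 components and 32 lags, the result can be passed to a Yule-Walker solver of order up to 31. A flat spectrum gives a value at lag 0 only, which is the expected behaviour for "perfect" noise.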

The main advantage of this parameter is that it
takes account of psychoacoustic phenomena. The price is an
increase in the bit rate needed to transfer the reference,
which consists of 5 or 10 values for 1024 signal samples
(21 ms for an audio signal sampled at 48 kHz), that is to
say a bit rate of 7.5 to 15 kbit/s.
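The quoted bit rates can be checked with a short calculation. The word length of each transmitted value is not stated in the text; 32 bits per value is assumed here because it reproduces the figures of 7.5 and 15 kbit/s. The function name is hypothetical.

```python
def reference_bit_rate(values_per_window,
                       window_samples=1024,
                       sample_rate_hz=48_000,
                       bits_per_value=32):   # assumed word length
    """Bit rate in bit/s needed to transmit the reference values."""
    windows_per_second = sample_rate_hz / window_samples   # 46.875
    return values_per_window * bits_per_value * windows_per_second


rate_5 = reference_bit_rate(5)     # 7500.0 bit/s
rate_10 = reference_bit_rate(10)   # 15000.0 bit/s
rate_1 = reference_bit_rate(1)     # 1500.0 bit/s for a single value
```

Under the same assumption, the single-number parameters described earlier would need about 1.5 kbit/s.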

The characteristic parameter P may generally be any
magnitude obtained from the content of the digital
signals, for example, in the case of video signals:

- the brightness of the image or of an area thereof
as given by the continuous coefficients F(0,0) of the
discrete cosine transform of the image, or any other
linear and orthogonal transform, by blocks or
global, and/or

- the contrast of the image or of an area thereof, 
obtained by applying a Sobel filter, for example, and/or 

- the activity SA of the image as defined, for
example, in the Applicant's application PCT WO 99/18736,
and obtained by a linear and orthogonal transformation by
blocks (discrete cosine transform, Fourier transform,
Haar transform, Hadamard transform, slant transform,
wavelet transform, etc.),

- the average of the image,

and in the case of audio signals:

- the power, and/or 

- the spectral power density as defined in French 
Patent Application FR 2 769 777 filed 13 October 1997, 
and/or one of the parameters described above. 

It will be noted that the parameter P may be
degraded by transmission, but in practice it is found
that synchronization may be obtained by the method of the
invention at the levels of deterioration generally
encountered in transmission networks.

As a general rule, once synchronization has been
acquired, the method may be used to verify that it has
been retained, in order to be able to remedy disturbances
such as bit stream interruptions, changes of bit stream,
changes of decoder, etc., as and when required, by
resynchronizing the two digital signals E and S.

The method described is applicable whenever it is
necessary to synchronize two digital streams. The method
yields a first synchronization range that is sufficiently
narrow to allow the use of standard real time fine
synchronization methods.

The method advantageously exploits one or more
parameters characteristic of the signals to be
synchronized that are represented by at least two
characteristic numbers, instead of all of the signals.
In a preferred embodiment, the combined use of a
plurality of parameters achieves more reliable
synchronization than the prior art techniques. Moreover,
the invention achieves synchronization at a chosen level
of accuracy and with less complexity than existing
methods. This form of synchronization delimits an error
range with a duration allowing subsequent use of standard
"fine" synchronization methods if higher accuracy is
required.

One particular application of measuring equipment
for implementing the method of the invention is
monitoring the quality of signals delivered by
audiovisual digital signal broadcasting networks.

The invention also provides sound and picture
synchronization for a data stream incorporating audio and
video data. To this end, video synchronization is
effected by calculating a video synchronization shift and
audio synchronization is effected by calculating an audio
synchronization shift. Moreover, it is possible to
determine if an offset between the sound and the picture
has occurred during transmission by comparing the values
of the two shifts, for example.
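The comparison of the two shifts can be sketched as follows. The units (video shift in frames, audio shift in samples), the sign convention and the function name are assumptions made for illustration; the text says only that the values of the two shifts are compared.

```python
def av_offset_ms(video_shift_frames, frame_rate_hz,
                 audio_shift_samples, sample_rate_hz):
    """Offset between sound and picture, in milliseconds.

    Both shifts are converted to time before being compared.
    Assumed convention: a positive result means the audio is
    delayed more than the video.
    """
    video_ms = 1000.0 * video_shift_frames / frame_rate_hz
    audio_ms = 1000.0 * audio_shift_samples / sample_rate_hz
    return audio_ms - video_ms
```

For instance, a video shift of 3 frames at 25 frames per second (120 ms) and an audio shift of 6144 samples at 48 kHz (128 ms) give an offset of 8 ms between sound and picture.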



